Skip to content

perf(physical-plan): optimize byte view append#21586

Open
kumarUjjawal wants to merge 2 commits intoapache:mainfrom
kumarUjjawal:feat/vectorized_append
Open

perf(physical-plan): optimize byte view append#21586
kumarUjjawal wants to merge 2 commits intoapache:mainfrom
kumarUjjawal:feat/vectorized_append

Conversation

@kumarUjjawal
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

ByteViewGroupValueBuilder::vectorized_append did extraper-value work. It rebuilt views and copied long values one by one even when the input already had reusable byte views.

What changes are included in this PR?

  • Reuse input views in vectorized_append
  • Use extend on the no-null fast path
  • Batch-copy long values from source buffers when rows are contiguous in the same input buffer
  • Keep correct handling for nulls, repeated rows, and out-of-order rows
  • Add tests for:
    • subset and repeated rows
    • take_n after vectorized append
    • multiple long batches across calls
    • mid-append flushes
    • oversized values

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Apr 13, 2026
}

let start_idx = self.views.len();
self.views.extend(rows.iter().map(|&row| source_views[row]));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be moved at line 202 and this will re-use the iteration over rows.

let start_idx = self.views.len();
self.views.extend(rows.iter().map(|&row| source_views[row]));

let mut pending = Vec::with_capacity(rows.len());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This most probably will over-allocate because only the rows with more than 12 bytes will be appended. Maybe use half of the rows length as initial capacity and let it grow if needed ?!

rows: &[usize],
) {
let source_views = array.views();
let mut pending = Vec::with_capacity(rows.len());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same - consider using a smaller initial capacity.

Copy link
Copy Markdown
Contributor Author

@kumarUjjawal kumarUjjawal Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think half or rows.len() would be good as you said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize ByteViewGroupValueBuilder vectorized_append

2 participants